Privacy-preserving deep learning is an emerging field in machine learning that aims to mitigate the privacy risks in the use of deep neural networks. One such risk is training data extraction from language models that have been trained on datasets containing personal and privacy-sensitive information. In our study, we investigate the extent of named entity memorization in fine-tuned BERT models. We use single-label text classification as a representative downstream task and employ three different fine-tuning setups in our experiments, including one with Differential Privacy (DP). We create a large number of text samples from the fine-tuned BERT models utilizing a custom sequential sampling strategy with two prompting strategies. We search these samples for named entities and check whether they are also present in the fine-tuning datasets. We experiment with two benchmark datasets in the domains of emails and blogs. We show that the application of DP has a substantial effect on the text generation capabilities of BERT. Furthermore, we show that a fine-tuned BERT does not generate more named entities specific to the fine-tuning dataset than a BERT model that is pre-trained only. This suggests that BERT is unlikely to emit personal or privacy-sensitive named entities. Overall, our results are important for understanding to what extent BERT-based services are prone to training data extraction attacks.
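The core memorization check described above amounts to asking which entities emitted by the model occur in the fine-tuning data but not in public pre-training data. A minimal sketch of that set logic, with made-up placeholder entity lists (not the paper's data or code), might look like:

```python
# Toy sketch of the memorization check: compare named entities found in
# model-generated text samples against entities from the fine-tuning set.
# All entity lists below are illustrative placeholders.

def dataset_specific_entities(generated, finetune_set, pretrain_set):
    """Entities emitted by the model that occur in the fine-tuning data
    but not in the (public) pre-training data -- candidate leaks."""
    return (set(generated) & set(finetune_set)) - set(pretrain_set)

generated_entities = ["Alice Smith", "London", "Bob Jones"]
finetune_entities = ["Alice Smith", "Bob Jones", "Carol White"]
pretrain_entities = ["London", "Bob Jones"]

leaks = dataset_specific_entities(generated_entities,
                                  finetune_entities,
                                  pretrain_entities)
print(sorted(leaks))  # entities unique to the fine-tuning data
```

In practice the generated entities would come from an NER system run over the sampled text, but the leak criterion itself is this intersection-minus-difference.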
We propose a neural network-based model for nonlinear dynamics in continuous time that can impose inductive biases on decay rates and/or frequencies. Inductive biases are helpful for training neural networks, especially when training data are scarce. The proposed model is based on Koopman operator theory, where the decay-rate and frequency information is used by restricting the eigenvalues of the Koopman operator that describes linear evolution in a Koopman space. We use neural networks to find an appropriate Koopman space; they are trained by minimizing multi-step forecasting and backcasting errors on irregularly sampled time-series data. Experiments on various time-series datasets demonstrate that the proposed method achieves higher forecasting performance given a single short training sequence than existing methods.
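The eigenvalue restriction can be illustrated without any learning: in the Koopman space each mode evolves linearly in continuous time, so writing an eigenvalue as -gamma + i*omega lets one force decay (gamma >= 0) or a known frequency (fixed omega). A minimal sketch with illustrative, not learned, values:

```python
import cmath

# Minimal sketch of the eigenvalue restriction: in the (learned) Koopman
# space, dynamics are linear, z(t) = exp(lambda * t) * z(0) per mode.
# Writing lambda = -gamma + i*omega imposes the inductive bias:
# gamma >= 0 forces decay, fixing omega fixes an oscillation frequency.
# gamma and omega below are illustrative choices, not learned values.

def evolve_mode(z0, gamma, omega, t):
    """Evolve one Koopman mode for a (possibly irregular) time gap t."""
    lam = complex(-gamma, omega)  # restricted eigenvalue
    return z0 * cmath.exp(lam * t)

z0 = 1.0 + 0.0j
# Irregularly spaced time stamps, as in the training setup above.
for t in [0.0, 0.3, 1.1, 2.4]:
    z = evolve_mode(z0, gamma=0.5, omega=2.0, t=t)
    print(t, abs(z))  # amplitude decays as exp(-gamma * t)
```

Because the evolution is in closed form, irregular sampling poses no difficulty: any time gap t can be plugged in directly, which is what makes the multi-step forecasting and backcasting losses straightforward to evaluate.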
Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
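The evaluation framework hinges on validating a cheap automatic attribution metric against human gold-standard judgments. A toy sketch of that validation step, with fabricated placeholder labels (not the paper's data or metric), might look like:

```python
# Toy sketch of the evaluation setup: human judgments of whether a cited
# passage supports the generated answer serve as the gold standard, and
# an automatic metric is validated by its agreement with them.
# The labels below are fabricated placeholders for illustration only.

def agreement(human_labels, auto_labels):
    """Fraction of examples where the automatic attribution metric
    agrees with the human gold-standard judgment."""
    assert len(human_labels) == len(auto_labels)
    matches = sum(h == a for h, a in zip(human_labels, auto_labels))
    return matches / len(human_labels)

human = [1, 1, 0, 1, 0]  # 1 = answer is attributable to the cited passage
auto = [1, 0, 0, 1, 0]   # automatic metric's predictions
print(agreement(human, auto))
```

A metric whose agreement (or correlation) with human labels is high can then stand in for expensive annotation during model development, which is the role the abstract assigns to its automatic metric.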
In this paper, we study convex optimization using a very general formulation called BSGD (Block Stochastic Gradient Descent). At each iteration, some, but not necessarily all, components of the parameter are updated. The direction of each update can be one of two possibilities: (i) a noise-corrupted measurement of the gradient computed using first-order information, or (ii) an approximate gradient computed using function values that may themselves be corrupted by noise. This formulation embraces most of the currently used stochastic gradient methods. Building on the theory of stochastic approximation, we establish conditions under which BSGD converges to the global minimum. We then verify the predicted convergence through numerical experiments. The results show that, when approximate gradients are used, BSGD converges while momentum-based methods can diverge. With noisy first-order gradients, however, not only our BSGD but also standard (full-update) gradient descent and various momentum-based methods all converge.
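The block-update idea with function-value-based gradients, case (ii) above, can be sketched on a simple convex quadratic: each iteration picks a random block of coordinates and updates them with a finite-difference gradient built from noisy function evaluations. The objective, step size, and noise level below are illustrative choices, not the paper's:

```python
import random

# Minimal sketch of block stochastic gradient descent (BSGD) on the
# convex quadratic f(x) = sum(x_i^2): each iteration updates a random
# block of coordinates, using an approximate gradient built from
# (possibly noisy) function evaluations, as in option (ii).

def f(x, noise=0.0):
    return sum(v * v for v in x) + random.gauss(0.0, noise)

def bsgd(x, steps=2000, lr=0.05, h=1e-3, block=2, noise=1e-6):
    x = list(x)
    for _ in range(steps):
        coords = random.sample(range(len(x)), block)  # block to update
        for i in coords:
            # two-point finite-difference approximation of df/dx_i
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            g = (f(xp, noise) - f(xm, noise)) / (2 * h)
            x[i] -= lr * g
    return x

random.seed(0)
x = bsgd([3.0, -2.0, 1.5, 4.0])
print(x)  # approaches the global minimum at the origin
```

Since only `block` of the coordinates move per iteration, the full-update case is recovered by setting `block = len(x)`, which is one sense in which the formulation embraces standard stochastic gradient methods.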
Policy evaluation based on A/B testing has attracted great interest in digital marketing, but such evaluation in ride-sourcing platforms (e.g., Uber and Didi) has not been well studied, mainly due to the complex structure of their temporally and/or spatially dependent experiments. This paper concerns policy evaluation in ride-sourcing platforms, with the goal of establishing a causal relationship between a platform's policies and outcomes of interest under a switchback design. We propose a novel potential-outcome framework based on a temporal varying coefficient decision process (VCDP) model to capture dynamic treatment effects in temporally dependent experiments. We further characterize the average treatment effect by decomposing it into the sum of a direct effect (DE) and an indirect effect (IE). We develop estimation and inference procedures for both DE and IE. Moreover, we propose a spatio-temporal VCDP to handle spatio-temporally dependent experiments. For both VCDP models, we establish the statistical properties (e.g., weak convergence and asymptotic power) of the estimation and inference procedures. We conduct extensive simulations to study the finite-sample performance of the proposed estimation and inference procedures. We also examine how the VCDP models can help improve the policy evaluation of various dispatch and dispose policies at Didi.
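The DE/IE decomposition reflects that, in a temporally dependent experiment, treating at time t moves the outcome at time t directly and also shifts the state entering time t+1, affecting later outcomes. A toy two-period illustration with made-up linear dynamics and coefficients (the paper instead uses time-varying coefficient decision process models) might look like:

```python
# Toy illustration of the direct/indirect effect decomposition in a
# dynamic system: treatment at t=0 changes Y_0 (direct effect, DE) and
# also shifts the state entering t=1, changing Y_1 (indirect effect, IE).
# The linear model and coefficients are made up for illustration.

def outcome(state, action, beta=1.0, gamma=0.5):
    return gamma * state + beta * action

def next_state(state, action, rho=0.8, delta=0.6):
    return rho * state + delta * action

def two_period_effect(s0):
    """Total effect of treating at t=0 (vs. not) on Y_0 + Y_1."""
    # treated trajectory
    y0_t = outcome(s0, 1)
    y1_t = outcome(next_state(s0, 1), 0)
    # control trajectory
    y0_c = outcome(s0, 0)
    y1_c = outcome(next_state(s0, 0), 0)
    de = y0_t - y0_c  # same-period effect of the action
    ie = y1_t - y1_c  # effect carried forward through the state
    return de, ie, de + ie

de, ie, total = two_period_effect(s0=0.0)
print(de, ie, total)
```

In this toy model DE equals the contemporaneous coefficient beta and IE equals gamma * delta, the outcome sensitivity to the state times the state's sensitivity to the action; the average treatment effect is their sum, mirroring the DE + IE characterization above.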
In this paper, we introduce new methods for computing medial surfaces between various 3D datasets, such as point clouds and/or discrete surfaces. Traditionally, the medial surface is obtained by detecting singularities of a computed distance function, such as ridges, triple junctions, etc. This requires computing second-order differential characteristics, and certain kinds of heuristics must also be applied. In contrast, we determine the medial surface merely by computing the distance functions themselves, which is a fast and simple approach. We present and compare the results of the fast sweeping method, the vector distance transform algorithm, the fast marching method, and the Dijkstra-Pythagoras method when finding the medial surface between 3D datasets.
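The distance-function idea can be shown in miniature on a 2D grid: compute the distance field to each of two datasets and take the medial set where the two distances coincide. The brute-force fields below stand in for the efficient solvers the paper compares (fast sweeping, fast marching, etc.); the grid and point sets are illustrative:

```python
import math

# Minimal 2D sketch of the distance-function approach: compute the
# (brute-force, exact) distance field to each of two point sets, then
# take the medial set as the grid cells where the two distances are
# (nearly) equal. The paper's methods solve for these distance fields
# far more efficiently in 3D; the data here are illustrative.

def dist_field(points, grid):
    return {p: min(math.dist(p, q) for q in points) for p in grid}

grid = [(x, y) for x in range(9) for y in range(9)]
set_a = [(0, y) for y in range(9)]  # left edge of the grid
set_b = [(8, y) for y in range(9)]  # right edge of the grid

da = dist_field(set_a, grid)
db = dist_field(set_b, grid)

# Medial cells: equidistant from both datasets (up to grid resolution).
medial = [p for p in grid if abs(da[p] - db[p]) < 0.5]
print(medial)  # the vertical line x == 4, midway between the two edges
```

No ridge detection or second-order differentiation is needed: the medial set falls out of a pointwise comparison of the two first-order distance fields, which is the simplification the abstract claims.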